Abstract

For the one-sided hypothesis testing problem it is shown that it is
possible to reconcile Bayesian evidence against $H_0$, expressed in terms
of the posterior probability that $H_0$ is true, with frequentist evidence
against $H_0$, expressed in terms of the p-value. In fact, for many classes
of prior distributions it is shown that the infimum of the Bayesian
posterior probability of $H_0$ is either equal to or bounded above by the
p-value. The results are in direct contrast to recent work of Berger
and Sellke (1985) in the two-sided (point null) case, where it was found
that the p-value is much less than the Bayesian infimum. Some comments
on the point null problem are given.


1. Introduction

In the problem of hypothesis testing, 'evidence' can be thought of as
a post-experimental (data-based) evaluation of the tenability of the
null hypothesis, $H_0$. To a Bayesian, evidence takes the form of the
posterior probability that $H_0$ is true, while to a frequentist, evidence
takes the form of the p-value, or significance level, of the test. If
the null hypothesis consists of a single point, it has long been known
that these two measures of evidence can greatly differ. The famous
paper of Lindley (1957) illustrates the possible discrepancy in the
normal case.

The question of reconciling these two measures of evidence has been
treated in the literature. For the most part, the two-sided (point null)
problem has been treated, and the major conclusion has been that the
p-value tends to overstate the evidence against $H_0$ (that is, the p-value
tends to be smaller than a Bayesian posterior probability). Many references
can be found in Shafer (1982). However, Pratt (1965) does state that in the
one-sided testing problem, the p-value is approximately equal to the
posterior probability of $H_0$.

A slightly different approach to the problem of reconciling evidence
was taken by DeGroot (1973). Working in a fairly general setting, DeGroot
constructs alternative distributions and finds improper priors for which
the p-value and posterior probability match. DeGroot assumes that the
alternative distributions are stochastically ordered which, although he does
not explicitly state it, essentially puts him in the one-sided testing
problem.


Dickey (1977), in the two-sided problem, considers classes of priors,
and examines the infimum of Bayesian evidence against $H_0$. As a measure of
Bayesian evidence Dickey uses the "Bayes factor," which is closely related
to the posterior probability of $H_0$. He also concludes that the p-value
overstates the evidence against $H_0$, even when compared to the infimum of
Bayesian evidence.

A recent paper by J. Berger and T. Sellke (1985) has approached the
problem of reconciling evidence in a manner similar to Dickey's approach.
For the Bayesian measure of evidence they consider the infimum, over a
class of priors, of the posterior probability that $H_0$ is true. For many
classes of priors it turns out that this infimum is much greater than the
frequentist p-value, leading Berger and Sellke to conclude that, "...
significance levels can be highly misleading measures of the evidence
provided by the data against the null hypothesis."

Although  their  arguments  are  compelling,  and  may  lead  one  to  question 
the  worth  of  p-values,  their  analyses  are  restricted  to  the  problem  of 
testing  a  point  null  hypothesis.  If,  in  fact,  the  p-value  is  a  misleading 
measure  of  evidence,  discrepancies  with  Bayesian  measures  should  emerge 
in  other  hypothesis  testing  situations. 

The point null hypothesis is perhaps the most used and misused
statistical technique. In particular, in the location parameter problem,
the point null hypothesis is more a mathematical convenience than
the statistical method of choice. Few experimenters, of whom we are
aware, want to conclude "there is a difference." Rather, they are looking
to conclude "the new treatment is better." Thus, for the most part, there
is a direction of interest in almost any experiment, and saddling an
experimenter with a two-sided test will not lead to the desired conclusions.


In this paper we consider the problem of reconciling evidence in the
one-sided testing problem. We find, in direct contrast to the results of
Berger and Sellke, that evidence can be reconciled. That is, for many
classes of priors, the infimum of the Bayes posterior probability that $H_0$
is true is either equal to or bounded above by the p-value.

In Section 2 we present some necessary preliminaries, including the
classes of priors we are considering and how they relate to those considered
in the two-sided problem. Section 3 contains the main results concerning
the relationship between Bayesian and frequentist evidence. Section 4
considers classes of priors that are biased toward $H_0$, and Section 5
contains comments about testing a point null hypothesis.


2.  Preliminaries 


We consider testing the hypotheses

$$H_0: \theta \le 0 \quad \text{vs.} \quad H_1: \theta > 0 \tag{2.1}$$

based on observing $X = x$, where $X$ has location density $f(x-\theta)$. Throughout
this paper, unless explicitly stated, we assume that

i) $f(\cdot)$ is symmetric about zero
ii) $f(x-\theta)$ has monotone likelihood ratio (mlr).

Recall that i) and ii) imply that $f(\cdot)$ is unimodal.

If $X = x$ is observed, a frequentist measure of evidence against $H_0$ is
given by the p-value

$$p(x) = P(X \ge x \mid \theta = 0) = \int_x^{\infty} f(t)\,dt. \tag{2.2}$$

A Bayesian measure of evidence, given a prior distribution $\pi(\theta)$, is the
probability that $H_0$ is true given $X = x$,

$$P(H_0 \mid x) = P(\theta \le 0 \mid x) = \frac{\int_{-\infty}^{0} f(x-\theta)\,\pi(\theta)\,d\theta}{\int_{-\infty}^{\infty} f(x-\theta)\,\pi(\theta)\,d\theta}. \tag{2.3}$$
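As a rough numerical illustration of (2.2) and (2.3), the sketch below evaluates both quantities by trapezoid integration. It assumes a standard normal $f$ and an n(0, 4) prior purely for the sake of example; the helper names (`integrate`, `p_value`, `posterior_prob_h0`, `normal_prior`) and the truncation bounds are our own choices, not part of the paper.

```python
import math


def normal_pdf(t):
    """Standard normal density, standing in for the location density f."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def integrate(func, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h


def p_value(x, f=normal_pdf, width=40.0):
    """p(x) in (2.2): integral of f over [x, infinity), truncated at x + width."""
    return integrate(f, x, x + width)


def posterior_prob_h0(x, prior, f=normal_pdf, bound=40.0):
    """P(H0 | x) in (2.3), with theta restricted to [-bound, bound]."""
    def weight(theta):
        return f(x - theta) * prior(theta)
    return integrate(weight, -bound, 0.0) / integrate(weight, -bound, bound)


def normal_prior(tau):
    """Density of an n(0, tau^2) prior (hypothetical helper for the example)."""
    return lambda theta: normal_pdf(theta / tau) / tau


if __name__ == "__main__":
    x = 1.645
    print("p-value:", p_value(x))
    print("P(H0 | x) under an n(0, 4) prior:", posterior_prob_h0(x, normal_prior(2.0)))
```

Any other symmetric mlr density can be passed as `f`, provided the truncation bounds are wide enough for its tails.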


Our major point of concern is whether these two measures of evidence
can be reconciled, that is, can the p-value, in some sense, be regarded as
a Bayesian measure of evidence? Since the p-value is based on the objective
frequentist model, it seems apparent that, if reconciliation is possible,
it must be based on impartial prior distributions. By impartial we mean
that the prior distribution gives equal weight to both the null and
alternative hypotheses.

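Four reasonable classes of distributions are given by

$$
\begin{aligned}
G_A &= \{\text{all distributions giving mass } \tfrac{1}{2} \text{ to each of } (-\infty, 0] \text{ and } (0, \infty)\} \\
G_S &= \{\text{all distributions symmetric about zero}\} \\
G_{US} &= \{\text{all unimodal distributions symmetric about zero}\} \\
G_{NOR} &= \{\text{all normal } (0, \tau^2) \text{ distributions, } 0 < \tau^2 < \infty\}.
\end{aligned}
\tag{2.4}
$$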

For any class of priors, we can obtain a reasonably objective Bayesian
measure of evidence by considering $\inf P(H_0 \mid x)$, where the infimum is taken
over a chosen class of priors. We can then examine the relationship between
this infimum and $p(x)$. If there is agreement, we can conclude that Bayesian
and frequentist evidence can be reconciled.

This development is, of course, similar to that of Berger and Sellke
(1985), who consider the two-sided hypothesis test $H_0: \theta = 0$ vs. $H_1: \theta \ne 0$,
using priors of the form

$$\pi(\theta) = \begin{cases} \pi_0 & \text{if } \theta = 0 \\ (1-\pi_0)\,g(\theta) & \text{if } \theta \ne 0, \end{cases}$$

and allow $g(\cdot)$ to vary within a class of distributions, similar to the classes
in (2.4). For any numerical calculations they choose $\pi_0 = \tfrac{1}{2}$, asserting that
this provides an impartial prior distribution. We will return to this
question in Section 5.

For testing $H_0: \theta \le 0$ vs. $H_1: \theta > 0$, we will mainly be concerned with
evidence based on observing $x > 0$. For $x < 0$, $p(x) > \tfrac{1}{2}$ and $\inf P(H_0 \mid x) = \tfrac{1}{2}$,
where the infimum is over any of the classes in (2.4). Thus, if $x < 0$, neither
a frequentist nor a Bayesian would consider the data as having evidence against
$H_0$, so there is, in essence, nothing to be reconciled.
3. Symmetric Prior Distributions

In this section we consider prior distributions contained in the classes
given in (2.4). Our goal is to calculate $\inf P(H_0 \mid x)$ for each of these
classes, and relate the answer to $p(x)$. In some cases we do not calculate
$\inf P(H_0 \mid x)$ exactly, but rather obtain an upper bound on the infimum. This
is accomplished by calculating the infimum exactly for smaller classes of
distributions.

For the one-sided testing problem, the class $G_A$ is too large to be
of use, as the following theorem shows.


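Theorem 3.1: For the hypotheses in (2.1), if $x > 0$, then

$$\inf_{\pi \in G_A} P(H_0 \mid x) = 0.$$

Proof: Consider a sequence of priors

$$\pi_k(\theta) = \begin{cases} \tfrac{1}{2} & \text{if } \theta = -k \\ g(\theta) & \text{if } \theta > 0, \end{cases}$$

where $\int_0^{\infty} g(\theta)\,d\theta = \tfrac{1}{2}$. Then

$$P(H_0 \mid x) = \frac{\tfrac{1}{2} f(x+k)}{\tfrac{1}{2} f(x+k) + \int_0^{\infty} f(x-\theta)\,g(\theta)\,d\theta},$$

and it is easy to see that $\lim_{k \to \infty} P(H_0 \mid x) = 0$, establishing the result. □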
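The limit in the proof of Theorem 3.1 can be watched numerically. The sketch below is only an illustration under assumptions of our own: $f$ is taken to be standard normal and $g(\theta) = \tfrac{1}{2}e^{-\theta}$ on $\theta > 0$ (so that $g$ integrates to $\tfrac{1}{2}$); the function names are hypothetical.

```python
import math


def normal_pdf(t):
    """Standard normal density, used as an illustrative choice of f."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def alt_marginal(x, upper=40.0, n=20000):
    """Trapezoid approximation of the integral over theta > 0 of
    f(x - theta) * g(theta), with the assumed g(theta) = exp(-theta) / 2."""
    def weight(theta):
        return normal_pdf(x - theta) * 0.5 * math.exp(-theta)
    h = upper / n
    total = 0.5 * (weight(0.0) + weight(upper))
    for i in range(1, n):
        total += weight(i * h)
    return total * h


def posterior_h0(x, k):
    """P(H0 | x) when the prior puts mass 1/2 at theta = -k and g on theta > 0."""
    null_part = 0.5 * normal_pdf(x + k)
    return null_part / (null_part + alt_marginal(x))


if __name__ == "__main__":
    for k in (0.0, 2.0, 5.0, 10.0):
        print(k, posterior_h0(1.0, k))
```

As $k$ grows, the point mass sits where the likelihood is negligible, so the posterior probability of $H_0$ collapses toward zero even though the prior gives $H_0$ mass one half.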

Although we cannot obtain explicit answers for the class $G_S$, we can get
some interesting results for the smaller class contained in $G_S$,

$$G_{2PS} = \{\text{all two-point distributions symmetric about } 0\}.$$




Let $x_1 = k - x$, $x_2 = k + x$, $\theta_1 = 0$, and $\theta_2 = k - z$. The integrand is of the
form $f(x_1-\theta_1)f(x_2-\theta_2) - f(x_2-\theta_1)f(x_1-\theta_2)$. Since $x > 0$, $x_2 > x_1$, and
$\theta_2 \ge \theta_1$ if and only if $z \le k$. Thus, the fact that $f$ has mlr implies
that the integrand is nonnegative if $z \le k$ and nonpositive if $z > k$.
It also follows from the assumptions on $f$ that $\lim_{k\to\infty} f(k-x) = \lim_{k\to\infty} f(k+x)$ exists
and equals zero. For $z > k$, $f(k-x)f(x+z) - f(k+x)f(-x+z) \le 0$, so

$$|f(k-x)f(x+z) - f(k+x)f(-x+z)| \le f(k+x)f(-x+z) \le f(x)f(-x+z), \tag{3.6}$$

the last inequality following since $f$ is unimodal and $x > 0$. Thus, by the
dominated convergence theorem,

$$\lim_{k\to\infty} \int_k^{\infty} [f(k-x)f(x+z) - f(k+x)f(-x+z)]\,dz = 0. \tag{3.7}$$

Hence,

$$\lim_{k\to\infty} [f(k-x)\,p(x) - f(k+x)(1 - p(x))] = \lim_{k\to\infty} \int_0^{k} [f(k-x)f(x+z) - f(k+x)f(-x+z)]\,dz \ge 0, \tag{3.8}$$

establishing (3.4) and proving the theorem. □
The inequality between $\inf_{\pi \in G_{2PS}} P(H_0 \mid x)$ and $p(x)$ is, in fact, strict
in many cases. Table 1 gives explicit expressions for some common
distributions.

The Cauchy distribution, which does not have mlr, does not attain
its infimum at $k = \infty$ but rather at $k = (x^2+1)^{1/2}$. Even so, it is still
the case that the p-value is greater than $\inf P(H_0 \mid x)$ for the Cauchy
distribution.


Table 1. P-values and $\inf P(H_0 \mid x)$ for the class of symmetric two-point
distributions ($x > 0$)

Distribution          p(x)                                  inf P(H_0|x)

normal                $1 - \Phi(x)$                         $0$
double exponential    $\tfrac{1}{2}e^{-x}$                  $(1 + e^{2x})^{-1}$
logistic              $(1 + e^{x})^{-1}$                    $(1 + e^{2x})^{-1}$
Cauchy                $\tfrac{1}{2} - \tan^{-1}(x)/\pi$     $\dfrac{1 + (x-(x^2+1)^{1/2})^2}{2 + (x-(x^2+1)^{1/2})^2 + (x+(x^2+1)^{1/2})^2}$
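For a symmetric two-point prior with mass $\tfrac{1}{2}$ at each of $\pm k$, (2.3) reduces to $f(x+k)/[f(x+k)+f(x-k)]$. The sketch below uses that reduction to spot-check two of the closed forms quoted for Table 1; the densities are written in their standard (unit-scale) forms, which is an assumption on our part, and the function names are illustrative.

```python
import math


def double_exp_pdf(t):
    """Standard double exponential (Laplace) density."""
    return 0.5 * math.exp(-abs(t))


def cauchy_pdf(t):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + t * t))


def two_point_posterior(f, x, k):
    """P(H0 | x) when the prior puts mass 1/2 on each of -k and +k."""
    return f(x + k) / (f(x + k) + f(x - k))


if __name__ == "__main__":
    x = 1.0
    # Double exponential: the posterior is constant once k exceeds x.
    print(two_point_posterior(double_exp_pdf, x, 50.0), 1.0 / (1.0 + math.exp(2.0 * x)))
    # Cauchy: the minimum over k is at k = sqrt(x^2 + 1), not at infinity.
    k_star = math.sqrt(x * x + 1.0)
    print(two_point_posterior(cauchy_pdf, x, k_star), 0.5 - math.atan(x) / math.pi)
```

In both cases the two-point value sits below the corresponding p-value, in line with the strict inequality noted in the text.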

We now turn to the class of distributions $G_{US}$, where we again obtain the
p-value as an upper bound on the infimum of the Bayesian evidence. We can,
in fact, demonstrate equality between $p(x)$ and $\inf P(H_0 \mid x)$ for two classes
of distributions contained in $G_{US}$. We first consider

$$U_S = \{\text{all symmetric uniform distributions}\}.$$

Theorem 3.3: For the hypotheses in (2.1), if $x > 0$,

$$\inf_{\pi \in U_S} P(H_0 \mid x) = p(x).$$


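Proof: Let $\pi(\theta)$ be uniform $(-k, k)$. Then

$$P(H_0 \mid x) = \frac{\int_{-k}^{0} f(x-\theta)\,d\theta}{\int_{-k}^{k} f(x-\theta)\,d\theta} \tag{3.9}$$

and

$$\frac{d}{dk} P(H_0 \mid x) = \frac{f(x-k)+f(x+k)}{\int_{-k}^{k} f(x-\theta)\,d\theta}\left[\frac{f(x+k)}{f(x-k)+f(x+k)} - P(H_0 \mid x)\right]. \tag{3.10}$$

We will now establish that $P(H_0 \mid x)$, as a function of $k$, has no minimum on
the interior of $(0, \infty)$. Suppose $k = k_0$ satisfies

$$\frac{d}{dk} P(H_0 \mid x)\Big|_{k=k_0} = 0. \tag{3.11}$$

It is straightforward to establish that the sign of the second derivative,
evaluated at $k = k_0$, is given by

$$\operatorname{sgn} \frac{d^2}{dk^2} P(H_0 \mid x)\Big|_{k=k_0} = \operatorname{sgn} \frac{d}{dk}\,\frac{f(x+k)}{f(x-k)+f(x+k)}\Big|_{k=k_0}. \tag{3.12}$$

Since $f$ has mlr, the ratio $f(x+k)/f(x-k)$ is decreasing in $k$ for fixed $x > 0$.
Therefore, the sign of (3.12) is always negative, so any interior extremum
can only be a maximum. The minimum is obtained on the boundary, and it is
straightforward to check that

$$\inf_{\pi \in U_S} P(H_0 \mid x) = \lim_{k\to\infty} \frac{\int_{-k}^{0} f(x-\theta)\,d\theta}{\int_{-k}^{k} f(x-\theta)\,d\theta} = \int_{-\infty}^{0} f(x-\theta)\,d\theta = p(x). \quad □$$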
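The behavior claimed in Theorem 3.3 is easy to check in a concrete case. The following sketch assumes a standard normal $f$, for which (3.9) has a closed form in terms of the normal cdf; the names `Phi` and `posterior_uniform` are ours.

```python
import math


def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_uniform(x, k):
    """P(H0 | x) from (3.9) when f is standard normal and the prior is uniform(-k, k)."""
    return (Phi(x + k) - Phi(x)) / (Phi(x + k) - Phi(x - k))


if __name__ == "__main__":
    x = 1.645
    for k in (0.1, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(k, posterior_uniform(x, k))
    print("p-value:", 1.0 - Phi(x))
```

For fixed $x > 0$ the printed values fall steadily from about one half toward the p-value as the prior spreads out, so the infimum is approached only in the limit $k \to \infty$.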

A similar result can be obtained for another class of distributions,
$G_{MU}$, which consists of mixtures of symmetric uniform distributions. Let $G$
be the set of all densities $g$ on $[0, \infty)$ such that the scale parameter family
$\{\sigma^{-1} g(k/\sigma),\ \sigma > 0\}$ has mlr in $k$. Define

$$G_{MU} = \left\{\pi : \pi(\theta) = \int_0^{\infty} (2k)^{-1} I_{(-k,k)}(\theta)\,\sigma^{-1} g(k/\sigma)\,dk,\ g \in G,\ \sigma > 0\right\}.$$

The class $G_{MU}$ contains many familiar distributions symmetric about zero,
including all normal and t distributions.


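Theorem 3.4: For the hypotheses in (2.1), if $x > 0$,

$$\inf_{\pi \in G_{MU}} P(H_0 \mid x) = p(x). \tag{3.13}$$

Proof: Let $\pi(\theta) \in G_{MU}$. By interchanging the order of integration and using
the symmetry of $f$ we obtain

$$P(H_0 \mid x) = \frac{\int_0^{\infty} (2k\sigma)^{-1} g(k/\sigma) \int_{-x-k}^{-x} f(z)\,dz\,dk}{\int_0^{\infty} (2k\sigma)^{-1} g(k/\sigma) \int_{-x-k}^{-x+k} f(z)\,dz\,dk}. \tag{3.14}$$

We first show that, for fixed $g$,

$$\inf_{0<\sigma<\infty} P(H_0 \mid x) = \lim_{\sigma\to\infty} P(H_0 \mid x). \tag{3.15}$$

For notational convenience define

$$h(x,\sigma) = \int_0^{\infty} \sigma^{-1} g(y/\sigma) f(y-x)\,dy.$$

After multiplying the numerator and denominator of (3.14) by the common
factor $2\sigma$ (which leaves the ratio unchanged), the numerator has derivative
with respect to $\sigma$ equal to $h(-x,\sigma)$ and the denominator has derivative equal
to $h(-x,\sigma)+h(x,\sigma) > 0$. It follows that

$$\operatorname{sgn} \frac{d}{d\sigma} P(H_0 \mid x) = \operatorname{sgn}\left[\frac{h(-x,\sigma)}{h(-x,\sigma)+h(x,\sigma)} - P(H_0 \mid x)\right].$$

We now establish that if $P(H_0 \mid x)$ has an extremum for $0 < \sigma < \infty$, that extremum
must be a maximum. Suppose that $\sigma = \sigma_0$ satisfies

$$\frac{d}{d\sigma} P(H_0 \mid x)\Big|_{\sigma=\sigma_0} = 0.$$

Then

$$\operatorname{sgn} \frac{d^2}{d\sigma^2} P(H_0 \mid x)\Big|_{\sigma=\sigma_0} = \operatorname{sgn} \frac{d}{d\sigma}\,\frac{h(-x,\sigma)}{h(-x,\sigma)+h(x,\sigma)}\Big|_{\sigma=\sigma_0} = \operatorname{sgn} \frac{d}{d\sigma}\,\frac{h(-x,\sigma)}{h(x,\sigma)}\Big|_{\sigma=\sigma_0}. \tag{3.16}$$

Since both $f(k-x)$ and $\sigma^{-1} g(k/\sigma)$ have mlr, it follows from the Basic Composition
Formula of Karlin (1968, p. 17) that $h(x,\sigma)$ also has mlr. Therefore, since
$x > 0$, the sign of the last expression in (3.16) is negative, showing that
any interior extremum must be a maximum. We therefore have

$$\inf_{0<\sigma<\infty} P(H_0 \mid x) = \min\left\{\lim_{\sigma\to 0} P(H_0 \mid x),\ \lim_{\sigma\to\infty} P(H_0 \mid x)\right\}.$$

But from (3.14) it is easily verified, using l'Hôpital's rule, that

$$\lim_{\sigma\to 0} P(H_0 \mid x) = \tfrac{1}{2}, \qquad \lim_{\sigma\to\infty} P(H_0 \mid x) = p(x) < \tfrac{1}{2}.$$

Moreover, since we obtain the same infimum, $p(x)$, regardless of the choice
of $g \in G$, we have that

$$\inf_{\pi \in G_{MU}} P(H_0 \mid x) = \inf_{g \in G}\ \inf_{0<\sigma<\infty} P(H_0 \mid x) = \inf_{g \in G} p(x) = p(x). \quad □$$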

We can summarize the results of the above two theorems, and the relationship
to $G_{US}$, in the following corollary.

Corollary 3.1: For the hypotheses in (2.1), if $x > 0$,

$$\inf_{\pi \in G_{US}} P(H_0 \mid x) \le \inf_{\pi \in U_S} P(H_0 \mid x) = \inf_{\pi \in G_{MU}} P(H_0 \mid x) = p(x).$$

This corollary is in striking contrast to the results of Berger and
Sellke (1985). In the two-sided problem with a point null hypothesis, they
argued that using impartial prior distributions does not lead to any
reconciliation between $\inf P(H_0 \mid x)$ and $p(x)$. In fact, for the cases they
considered, the Bayesian infimum was much greater than $p(x)$. In contrast,
we find that for classes of reasonable, impartial priors, such as $G_{MU}$, we
obtain equality between $\inf P(H_0 \mid x)$ and $p(x)$, showing that, in fact, $p(x)$
is a conservative measure of evidence against the null hypothesis.


We close this section by examining two important special cases. In the
first case we again obtain equality between $p(x)$ and $\inf P(H_0 \mid x)$.

Theorem 3.5: If $f(x-\theta) = (2\pi\sigma^2)^{-1/2} \exp\{-\tfrac{1}{2}(x-\theta)^2/\sigma^2\}$, then for the hypotheses in
(2.1), if $x > 0$,

$$\inf_{\pi \in G_{NOR}} P(H_0 \mid x) = p(x).$$

Proof: The result is easily established by noting

$$P(H_0 \mid x) = P\left(Z \le -\left(\frac{\tau^2}{\sigma^2+\tau^2}\right)^{1/2}\frac{x}{\sigma}\right), \qquad Z \sim n(0,1),$$

which attains its infimum at $\tau^2 = \infty$. □


We next consider the Cauchy distribution, to again examine the situation
when the assumption of mlr does not obtain. For the class $U_S$, the symmetric
uniform distributions, we calculate $\inf P(H_0 \mid x)$ where $f(x-\theta) = [\pi(1+(x-\theta)^2)]^{-1}$.
For $\pi(\theta)$ = uniform $(-k, k)$ it is straightforward to calculate

$$P(H_0 \mid x) = \frac{\tan^{-1}(x+k) - \tan^{-1}(x)}{\tan^{-1}(x+k) - \tan^{-1}(x-k)}.$$

For fixed $x > 0$, $P(H_0 \mid x)$ is not monotone in $k$, but rather attains a unique
minimum at a finite value of $k$. Table 2 lists the minimizing values of $k$,
$\inf P(H_0 \mid x)$, and the p-value for selected values of $x$.

Examination of Table 2 shows once again that $\inf P(H_0 \mid x)$ is smaller than
$p(x)$; this observation held true for more extensive calculations that are
not reported here. Therefore, even in the case of the Cauchy distribution,
the infimum of the Bayesian measure of evidence is smaller than the
frequentist p-value.


Table 2. P-values and $\inf P(H_0 \mid x)$ for $X \sim$ Cauchy, infimum over $U_S$

   x       k_min     p(x)    inf P(H_0|x)

   .2      2.363     .437    .429
   .4      2.444     .379    .363
   .6      2.570     .328    .306
   .8      2.727     .285    .260
  1.0      2.913     .250    .222
  1.2      3.112     .221    .192
  1.4      3.323     .197    .168
  1.6      3.541     .178    .148
  1.8      3.768     .161    .132
  2.0      3.994     .148    .119
  2.5      4.572     .121    .094
  3.0      5.158     .102    .077
  3.5      5.746     .089    .065
  4.0      6.326     .078    .056
  5.0      7.492     .063    .044
 10.0     13.175     .032    .020
 25.0     29.610     .013    .007
 50.0     56.260     .006    .004
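Entries of this kind can be recomputed with a one-dimensional search over $k$. The sketch below minimizes the arctangent expression for $P(H_0 \mid x)$ under a uniform $(-k, k)$ prior by golden-section search; the search routine, the bracketing interval, and the function names are our own choices, and the search relies on the stated fact that the minimum in $k$ is unique.

```python
import math


def posterior_cauchy_uniform(x, k):
    """P(H0 | x) for a Cauchy location density and a uniform(-k, k) prior."""
    return (math.atan(x + k) - math.atan(x)) / (math.atan(x + k) - math.atan(x - k))


def minimize_golden(func, lo, hi, tol=1e-9):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)


def cauchy_infimum(x, k_max=1000.0):
    """Return (k_min, inf P(H0 | x)) over uniform(-k, k) priors with 0 < k <= k_max."""
    k_min = minimize_golden(lambda k: posterior_cauchy_uniform(x, k), 1e-6, k_max)
    return k_min, posterior_cauchy_uniform(x, k_min)


if __name__ == "__main__":
    for x in (1.0, 2.0, 5.0):
        k_min, inf_post = cauchy_infimum(x)
        p = 0.5 - math.atan(x) / math.pi
        print(x, round(k_min, 3), round(p, 3), round(inf_post, 3))
```

Because the objective is very flat near its minimum, the located $k$ is less precisely determined than the minimized posterior probability itself.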
4. Biased Prior Distributions

In this section we examine two cases where the prior distributions
are biased toward $H_0$, and begin to see some of the reasons for the large
discrepancies between Bayesian and frequentist evidence in the two-sided
case.

Again consider $H_0: \theta \le 0$ vs. $H_1: \theta > 0$ where $X \sim n(\theta, \sigma^2)$, $\sigma^2$ known.
Consider the class of priors

$$G_{\theta_0} = \{n(\theta_0, \tau^2) \text{ distributions, } \theta_0 < 0 \text{ (fixed), } 0 < \tau^2 < \infty\}.$$

The class $G_{\theta_0}$ is clearly biased toward $H_0$; however, if we calculate
$\inf P(H_0 \mid x)$ over this class the result is again $p(x)$.

For any $\pi \in G_{\theta_0}$, it is easy to calculate

$$P(H_0 \mid x) = P\left(Z \le -\frac{(\tau/\sigma)\,x + (\sigma/\tau)\,\theta_0}{(\sigma^2+\tau^2)^{1/2}}\right), \tag{4.1}$$

where $Z \sim n(0,1)$. For $x > 0$, $P(H_0 \mid x)$ is a decreasing function of $\tau^2$, so
the infimum is attained at $\tau^2 = \infty$:

$$\inf_{\pi \in G_{\theta_0}} P(H_0 \mid x) = P(Z \le -x/\sigma) = p(x). \tag{4.2}$$

The effect of the bias for $H_0$ is diminished as $\tau^2$ increases, resulting in a
limit which is independent of $\theta_0$. This is a different situation from the
point-null case, where the prior probability on the point null is unaffected
by any limiting operation.
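The washing-out of the prior bias can be seen by evaluating the normal-prior posterior probability directly. The sketch below uses the standard conjugate-normal posterior for $X \sim n(\theta, \sigma^2)$ with an $n(\theta_0, \tau^2)$ prior; the function names and the particular values of $\theta_0$ and $\tau$ are illustrative assumptions.

```python
import math


def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_biased_normal(x, tau, theta0, sigma=1.0):
    """P(theta <= 0 | x) for X ~ n(theta, sigma^2) and an n(theta0, tau^2) prior.

    The posterior is normal with mean (tau^2 x + sigma^2 theta0) / (sigma^2 + tau^2)
    and standard deviation sigma * tau / sqrt(sigma^2 + tau^2).
    """
    numerator = (tau / sigma) * x + (sigma / tau) * theta0
    return Phi(-numerator / math.sqrt(sigma ** 2 + tau ** 2))


if __name__ == "__main__":
    x = 1.645
    for theta0 in (-1.0, -5.0):
        print(theta0, [round(posterior_biased_normal(x, tau, theta0), 4)
                       for tau in (0.5, 1.0, 2.0, 5.0, 20.0, 1e4)])
    print("p-value:", round(1.0 - Phi(x), 4))
```

For both choices of $\theta_0$ the posterior probability decreases in $\tau$ and ends up at the same limit, the p-value, regardless of how far below zero the prior mean was placed.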

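We next consider a family of priors in which every member is biased
toward $H_0$ by the same amount. Suppose that an experimenter is willing to
assert, for every $k > 0$, it is $q$ times more likely that $\theta \in (-k, 0)$ than
$\theta \in (0, k)$. This belief may be reflected in the prior

$$\pi(\theta) = \begin{cases} \dfrac{q}{k(1+q)} & -k < \theta < 0 \\[1ex] \dfrac{1}{k(1+q)} & 0 \le \theta < k. \end{cases} \tag{4.3}$$

Let $G_q$ denote the class of all of these priors. Then, by an argument similar
to that used in Theorem 3.3, if $f(x-\theta)$ has mlr and $x > 0$, for testing
$H_0: \theta \le 0$ vs. $H_1: \theta > 0$ we have

$$\inf_{\pi \in G_q} P(H_0 \mid x) = \lim_{k\to\infty} P(H_0 \mid x) = \frac{q\,p(x)}{q\,p(x) + 1 - p(x)}. \tag{4.4}$$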


The quantity in (4.4) is greater than $p(x)$ if $q > 1$ (prior biased toward $H_0$) and
less than $p(x)$ if $q < 1$ (prior biased toward $H_1$). Therefore, (4.4) is a very
reasonable measure of evidence, taking into account both prior beliefs and sample
information. However, even in this biased case, we do not observe the same
discrepancies as Berger and Sellke did in the point-null problem. For
example, we might ask, "How large must $q$ be in order that $\inf P(H_0 \mid x)$ is
twice as large as $p(x)$?" in order to get some idea of how the bias for $H_0$
affects the measure of evidence. For $p = .01, .05, .1$, and various values
of $m$, we can solve for $q$ such that $\inf P(H_0 \mid x) = mp$. Some values are
given in Table 3.

For small $m$, $q$ is approximately equal to $m$. However, for larger values
of $m$, $q$ increases rapidly, showing that the prior must be very biased
toward $H_0$ in order to achieve a large increase in $\inf P(H_0 \mid x)$.
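The relationship between $q$ and $m$ described here can be sketched in a few lines. The code assumes that the infimum over $G_q$ takes the limiting form $q\,p/(q\,p + 1 - p)$, obtained by letting $k \to \infty$ in the prior (4.3); that expression is our reading of (4.4), and the function names are hypothetical. Solving $q\,p/(q\,p + 1 - p) = mp$ gives $q = m(1-p)/(1-mp)$.

```python
def inf_posterior(q, p):
    """Assumed limiting value of P(H0 | x) for bias ratio q and p-value p."""
    return q * p / (q * p + 1.0 - p)


def bias_needed(m, p):
    """Bias ratio q for which the assumed infimum equals m * p (requires 0 < m * p < 1)."""
    if not 0.0 < m * p < 1.0:
        raise ValueError("m * p must lie strictly between 0 and 1")
    return m * (1.0 - p) / (1.0 - m * p)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        print(p, [round(bias_needed(m, p), 2) for m in (1.5, 2, 3, 5)])
```

Under this assumed form, $q$ stays close to $m$ while $mp$ is small and grows without bound as $mp$ approaches one, which matches the qualitative description in the text.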




5.  Comments 


For the problem of testing a one-sided hypothesis in a location
parameter family, it is possible to reconcile evidence between the Bayesian
and frequentist approaches. The frequentist p-value is, in many cases, an
upper bound on $\inf P(H_0 \mid x)$, showing that it is possible to regard the p-value
as assessing "the probability that $H_0$ is true." Even though this phrase has
no meaning within frequency theory, it has been argued that practitioners
sometimes attach such a meaning to the p-value. The results in this paper
show that, for testing a one-sided hypothesis, such a meaning can be
attached to the p-value.

The discrepancies observed by Berger and Sellke in the two-sided
(point null) case do not carry over to the problems considered here. This
leads to the question of determining what factors are crucial in
differentiating the two problems. It seems that if prior mass is concentrated
at a point (or in a small interval), then discrepancies between Bayesian
and frequentist measures will obtain. In fact, Berger and Sellke note
that for testing $H_0: \theta = 0$ vs. $H_1: \theta > 0$, the p-value and the Bayesian
infimum are quite different. (For example, for $X \sim n(\theta, 1)$, an observed
$x = 1.645$ will give a p-value of .05, while, if mass $\tfrac{1}{2}$ is concentrated at
zero, $\inf P(H_0 \mid x = 1.645) = .21$.)
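The value .21 in the example can be reproduced under one natural reading of the infimum, which we state as an assumption: the infimum is taken over all prior densities $g$ on $\theta > 0$, so the most favorable alternative concentrates at $\theta = x$, where the likelihood $f(x-\theta)$ is largest. The function name below is ours.

```python
import math


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def point_null_lower_bound(x, pi0=0.5):
    """Lower bound on P(H0 | x) for H0: theta = 0 vs. H1: theta > 0, X ~ n(theta, 1).

    Assumes prior mass pi0 at zero and the alternative prior free to place all of
    its mass at theta = x (x > 0), where f(x - theta) attains its maximum f(0).
    """
    best_alt = normal_pdf(0.0)
    null_part = pi0 * normal_pdf(x)
    return null_part / (null_part + (1.0 - pi0) * best_alt)


if __name__ == "__main__":
    print(round(point_null_lower_bound(1.645), 3))
```

With $\pi_0 = \tfrac{1}{2}$ the bound simplifies to $1/(1 + e^{x^2/2})$, which is roughly four times the p-value of .05 at $x = 1.645$.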

Seen in another light, however, it can be argued that placing a point
mass of $\tfrac{1}{2}$ at $H_0$ is not representative of an impartial prior distribution.
For the problem of testing $H_0: \theta \le 0$ vs. $H_1: \theta > 0$, consider priors of the
form

$$\pi(\theta) = \pi_0 h(\theta) + (1 - \pi_0) g(\theta), \tag{5.1}$$

where $\pi_0$ is a fixed number, and $h(\theta)$ and $g(\theta)$ are proper priors on
$(-\infty, 0]$ and $(0, \infty)$ respectively. It then follows that, for $x > 0$,

$$\sup_h P(H_0 \mid x) = \sup_h \frac{\pi_0 \int_{-\infty}^{0} f(x-\theta)h(\theta)\,d\theta}{\pi_0 \int_{-\infty}^{0} f(x-\theta)h(\theta)\,d\theta + (1-\pi_0)\int_0^{\infty} f(x-\theta)g(\theta)\,d\theta} = \frac{\pi_0 f(x)}{\pi_0 f(x) + (1-\pi_0)\int_0^{\infty} f(x-\theta)g(\theta)\,d\theta}, \tag{5.2}$$

and the last expression is equal to $P(H_0 \mid x)$ for the hypotheses $H_0: \theta = 0$
vs. $H_1: \theta > 0$ with prior $\pi(\theta) = \pi_0$ if $\theta = 0$ and $\pi(\theta) = (1 - \pi_0)g(\theta)$ if $\theta > 0$.
Thus, concentrating mass on the point null hypothesis is biasing the prior
in favor of $H_0$ as much as possible in this one-sided testing problem.

The calculation in (5.2) casts doubt on the reasonableness of regarding
$\pi_0 = \tfrac{1}{2}$ as impartial. In fact, it is not clear to us if any prior that
concentrates mass at a point can be viewed as an impartial prior.
Therefore, it is not surprising that the p-value and Bayesian evidence
differ in the normal example given above. Setting $\pi_0 = \tfrac{1}{2}$ actually reflects
a bias toward $H_0$, which is reflected in the Bayesian evidence.

To a Bayesian, the fact that evidence can be reconciled with the p-value
allows for a Bayesian interpretation of a p-value and the possibility of
regarding a p-value as an objective assessment of the probability that $H_0$ is
true. It also, to a Bayesian, gives the p-value a certain amount of
respectability. To a frequentist, the p-value (or significance level) has
long been regarded as an objective assessment of the tenability of $H_0$, an
interpretation that survives even within the Bayesian paradigm.